IBM HR Analytics

IBM HR Analytics Employee Attrition and Performance Dataset

In this study, we analyze HR data available from kaggle.com. The data is fictional and was created by IBM data scientists.

Categorical Parameters:

| Feature | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Education | Below College | College | Bachelor | Master | Doctor |
| Environment Satisfaction | Low | Medium | High | Very High | |
| Job Involvement | Low | Medium | High | Very High | |
| Job Satisfaction | Low | Medium | High | Very High | |
| Performance Rating | Low | Good | Excellent | Outstanding | |
| Relationship Satisfaction | Low | Medium | High | Very High | |
| WorkLife Balance | Bad | Good | Better | Best | |

In the dataset, each of these features is stored as its integer code, with the labels listed above.
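One way to keep this encoding at hand is a plain dictionary from integer code to label. The sketch below is illustrative only: the name `ORDINAL_LABELS`, the helper `decode`, and the CamelCase column names are assumptions, not taken from the original code.

```python
# Integer code -> label for each ordinal feature, copied from the table above.
# The CamelCase keys are assumed column names; adjust them to match the CSV header.
FOUR_LEVEL = {1: "Low", 2: "Medium", 3: "High", 4: "Very High"}

ORDINAL_LABELS = {
    "Education": {1: "Below College", 2: "College", 3: "Bachelor", 4: "Master", 5: "Doctor"},
    "EnvironmentSatisfaction": dict(FOUR_LEVEL),
    "JobInvolvement": dict(FOUR_LEVEL),
    "JobSatisfaction": dict(FOUR_LEVEL),
    "PerformanceRating": {1: "Low", 2: "Good", 3: "Excellent", 4: "Outstanding"},
    "RelationshipSatisfaction": dict(FOUR_LEVEL),
    "WorkLifeBalance": {1: "Bad", 2: "Good", 3: "Better", 4: "Best"},
}


def decode(feature, code):
    """Return the label for an integer code of the given feature (hypothetical helper)."""
    return ORDINAL_LABELS[feature][code]


print(decode("Education", 3))
print(decode("WorkLifeBalance", 4))
```

A mapping like this is mainly useful for labeling plots and tables; the integer codes themselves can be fed to a model directly because the levels are ordered.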

Loading the Dataset

First off, let's take a look at the dataset.
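A minimal sketch of loading and inspecting the data with pandas. The helper `load_hr_data` is hypothetical, and the rows below are made-up stand-ins so the example runs without the Kaggle file; with the real download, the path to the CSV would be passed instead.

```python
import io

import pandas as pd


def load_hr_data(source):
    """Read the HR CSV from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Made-up stand-in rows (not real records) so the sketch is self-contained.
sample_csv = io.StringIO(
    "Age,Attrition,Department,Education,MonthlyIncome\n"
    "30,Yes,Sales,2,4000\n"
    "45,No,Research & Development,1,5200\n"
    "38,No,Human Resources,3,3100\n"
)

data = load_hr_data(sample_csv)

print(data.shape)   # number of rows and columns
print(data.dtypes)  # which columns are numeric and which are text
print(data.head())  # first few rows
```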

Preprocessing

We also need to convert categorical data to numeric data.

We can use LabelEncoder to convert each categorical column to integer codes.
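The conversion could look roughly like the following. This is a sketch on a small made-up frame, not the original preprocessing code; the column values and the `encoders` dictionary are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows standing in for the real dataset.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes"],
    "Department": ["Sales", "Research & Development", "Sales", "Human Resources"],
    "Age": [41, 49, 37, 33],
})

# Treat every non-numeric column as categorical.
categorical_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

# Keep one fitted encoder per column so codes can be mapped back to labels later.
encoders = {}
for col in categorical_cols:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col].to_numpy())
    encoders[col] = enc

print(df)
print(list(encoders["Department"].classes_))
```

LabelEncoder assigns codes in sorted label order, so for nominal features such as a department name the resulting integers carry no real ordering. For models that would read an order into them, one-hot encoding may be a safer choice.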

Variance of the Features

Features with variance zero

First, we remove features that have zero variance, since a feature that takes the same value in every row carries no information for our modeling.
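A minimal sketch of this step, assuming the data is held in a pandas DataFrame. The helper `drop_zero_variance` and the toy values are hypothetical.

```python
import pandas as pd


def drop_zero_variance(df):
    """Return a copy of df without constant columns, plus the dropped column names."""
    constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
    return df.drop(columns=constant_cols), constant_cols


# Toy frame: two columns hold a single value in every row.
df = pd.DataFrame({
    "Age": [41, 49, 37],
    "EmployeeCount": [1, 1, 1],
    "StandardHours": [80, 80, 80],
    "MonthlyIncome": [5993, 5130, 2090],
})

reduced, dropped = drop_zero_variance(df)
print(dropped)
print(list(reduced.columns))
```

Counting distinct values works for text columns as well as numeric ones. For purely numeric data, scikit-learn's `VarianceThreshold` with its default threshold removes the same kind of constant features.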

Features with high variance

Moreover, features whose variance is much higher than that of the others can hurt our modeling process, because many estimators are sensitive to feature scale. For this reason, we would like to standardize features by removing the mean and scaling to unit variance. In this article, we demonstrated the benefits of scaling data using StandardScaler().
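Standardization with `StandardScaler` can be sketched as follows. The toy values are made up, and wrapping the result back into a DataFrame is a convenience choice rather than something the original code is known to do.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up values on very different scales.
df = pd.DataFrame({
    "Age": [41.0, 49.0, 37.0, 33.0],
    "MonthlyIncome": [5993.0, 5130.0, 2090.0, 2909.0],
})

scaler = StandardScaler()

# fit_transform returns a NumPy array; restore the column names and index.
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)

# Each column now has mean ~0 and population standard deviation ~1.
print(scaled.mean().round(6))
print(scaled.std(ddof=0).round(6))
```

When the data is later split into training and test sets, fitting the scaler on the training portion only and reusing it on the test portion avoids leaking test statistics into the model.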

Feature Correlation
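
A minimal sketch of one way to examine feature correlation, assuming Pearson correlation computed with pandas' `DataFrame.corr()`. The toy values and the 0.8 threshold are illustrative assumptions, not results from the dataset.

```python
import pandas as pd

# Made-up values: the first two columns move together, the third does not.
df = pd.DataFrame({
    "Age": [25, 32, 41, 49, 55],
    "TotalWorkingYears": [2, 8, 15, 24, 30],
    "DistanceFromHome": [10, 3, 7, 1, 12],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))

# Collect each pair of distinct features whose absolute correlation exceeds the threshold.
threshold = 0.8  # arbitrary cut-off for "strongly correlated"
cols = list(corr.columns)
strong_pairs = [
    (cols[i], cols[j])
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
    if abs(corr.iloc[i, j]) > threshold
]
print(strong_pairs)
```

Strongly correlated pairs are candidates for dropping one of the two features, since they carry largely redundant information.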

Saving


References

  1. Kaggle Dataset: IBM HR Analytics Employee Attrition & Performance